Learning Morphology: Algorithms for the Identification of the Stem Changes

Author

  • Evelin Kuusik
Abstract

The aim of the current work is to create tools for the automatic recognition of Estonian stem-changing rules. The main problem consists in bringing together the formal classification features available to the computer and classification based on human knowledge. This paper introduces two algorithms. First, in STLearn, a supervised inductive learning technique is used to find suitable features for the automatic recognition of stem changes. Two stem variants can be linked by more than one stem change. The second algorithm identifies the whole set of rules for a stem pair.

The current work is part of a project based on the open model of language [Viks94], according to which all regular and productive phenomena of natural language are represented by different types of rules, and irregular phenomena are listed in small dictionaries (exception lists). This approach makes it possible to process regular words not listed in dictionaries: new derivatives, loanwords, etc. The morphology subsystem plays the central role in processing morphologically complex languages such as Estonian. The number of possible stem variants varies strongly in Estonian: in some inflection types there are no stem variants at all, while in others a word can have as many as five different regular stem variants.

The current work presents tools for creating a formal description of the Estonian stem-changing rules, starting from a pair of stem variants. The Concise Morphological Dictionary of Estonian (CMD) [Viks92] serves as the basis for the current work; it contains over 36 000 headwords, each with two stem variants on average. The principal types of changes are the following:

1. Stem-grade changes. A stem can occur in either a strong or a weak grade; the grades are differentiated first of all by phonetic quantity (2nd or 3rd degree of quantity marked by ') that may be accompanied by various sound changes involving the medial sounds. For instance, the members of the stem pair hõive-h'õive are distinguished only by phonetic quantity; in the pair aabe-'aape, the rewriting rule b → p accompanies the change in phonetic quantity.
2. Stem-end changes. A stem can appear either as a lemmatic stem or as an inflection stem; the stem variants are differentiated by changes involving the final sounds (e.g. 'aadel-aadli, j'alg \foot\-j'alga, sipelgas \ant\-sipelga).
3. Secondary changes. These changes are conditioned by a particular context arising after either the stem-end or the stem-grade change (e.g. k'uppel \dome\ → *k'uppli → k'upli).

About 20% of stems remain unchanged; for the rest, the changes are mostly stem-end changes, stem-grade changes, or both at once. Formally, the recognition of the stem-change rules can be reduced to a classification task with string pairs as the objects to classify and the possible stem-change rules as the classes. The system has to create class descriptions from the available data: characters and their membership in sound classes. An important requirement for the classification system is the linguistic...
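
The classification setup described in the abstract (string pairs as objects, characters and sound classes as the available data) can be illustrated with a minimal Python sketch. This is not STLearn or the paper's second algorithm. The helper `describe_pair`, the feature names, and the two-way vowel/consonant sound classes are assumptions made purely for illustration.

```python
QUANTITY_MARK = "'"            # quantity mark as written in the examples
VOWELS = set("aeiouõäöü")      # assumed, simplified sound class


def strip_quantity(stem):
    """Remove the quantity mark so that only the sounds are compared."""
    return stem.replace(QUANTITY_MARK, "")


def sound_class(ch):
    """Map a character to a coarse sound class (hypothetical two-way split)."""
    return "V" if ch in VOWELS else "C"


def common_prefix_len(a, b):
    """Length of the longest shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def describe_pair(stem_a, stem_b):
    """Describe a pair of stem variants by simple formal features."""
    plain_a, plain_b = strip_quantity(stem_a), strip_quantity(stem_b)
    p = common_prefix_len(plain_a, plain_b)
    rest_a, rest_b = plain_a[p:], plain_b[p:]
    # Shared suffix of what is left after the shared prefix.
    s = common_prefix_len(rest_a[::-1], rest_b[::-1])
    diff_a = rest_a[:len(rest_a) - s]
    diff_b = rest_b[:len(rest_b) - s]
    if not diff_a and not diff_b:
        position = "none"
    elif s == 0:
        position = "final"     # the differing segment reaches the stem end
    else:
        position = "medial"    # the differing segment is followed by shared sounds
    return {
        "quantity_change": (QUANTITY_MARK in stem_a) != (QUANTITY_MARK in stem_b),
        "rewrite": (diff_a, diff_b),
        "sound_classes": (
            "".join(sound_class(c) for c in diff_a),
            "".join(sound_class(c) for c in diff_b),
        ),
        "position": position,
    }


# Stem pairs taken from the examples in the text.
for a, b in [("'aadel", "aadli"), ("j'alg", "j'alga"), ("sipelgas", "sipelga")]:
    print(a, b, describe_pair(a, b))
```

A learner in this setting would take such feature descriptions as input and the stem-change rules as class labels. Which features are actually suitable is the question the first algorithm is said to address, so the ones chosen here should be read as placeholders.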

Similar resources

Iterative learning identification and control for dynamic systems described by NARMAX model

A new iterative learning controller is proposed for a general unknown discrete time-varying nonlinear non-affine system represented by NARMAX (Nonlinear Autoregressive Moving Average with eXogenous inputs) model. The proposed controller is composed of an iterative learning neural identifier and an iterative learning controller. Iterative learning control and iterative learning identification ar...

Comprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features

Point cloud and LiDAR filtering is the removal of non-ground features from a digital surface model (DSM) in order to reach the bare earth and extract a DTM. Various methods have been proposed by different researchers to distinguish between ground and non-ground points in point cloud and LiDAR data. Most fully automated methods have a common disadvantage, and they are only effective for a particular type of surf...

Classification and identification of geological facies using seismic data and competitive neural networks

Geological facies interpretation is essential for reservoir studies. Classifying and identifying seismic traces is a powerful approach to geological facies classification and distinction. The use of neural networks as classifiers is increasing in different sciences such as seismology. They are computationally efficient and ideal for pattern identification. They can simply learn new algori...

O-3: Identification and Characterization of Repopulating Spermatogonial Stem Cells from The Adult Human Testis

Background: This study was conducted to identify and characterize repopulating spermatogonial stem cells (SSCs) in the adult human testes. Materials and Methods: Testes biopsies from obstructive azoospermic patients and normal segments of human testicular tissue were used. Flow cytometry, real time PCR and immunohistochemical analysis were performed. Purified human spermatogonia were transplant...

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...

Modeling the impact of learning environment and professor-student rapport on professional identification of medical students

Abstract: Introduction: Professional identification as a social process in order to define the individual and the professional community of the individual is affected by different environmental, individual and institutional factors. The aim of this study was to identify a model for examining the role of learning environment and professor-student rapport in predicting the professional iden...

Publication date: 1996